Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences
Speaking rate refers to the average number of phonemes per unit of time,
while rhythmic patterns refer to the duration distributions of different
phonemes as realized within different phonetic structures. Both are key
components of speech prosody, which varies from speaker to speaker.
Models such as the cycle-consistent adversarial network (Cycle-GAN) and the
variational auto-encoder (VAE) have been successfully applied to voice
conversion without parallel data. However, because of the network architectures
and feature vectors chosen in these approaches, the length of the predicted
utterance is fixed to that of the input utterance, which limits how well they
can mimic the speaking rate and rhythmic patterns of the target speaker.
Sequence-to-sequence learning models, on the other hand, remove this length
constraint but require parallel training data.
In this paper, we propose an approach that uses a sequence-to-sequence model
trained with an unsupervised Cycle-GAN to transform phoneme posteriorgram
sequences between different speakers. This removes the length constraint
described above, enabling rhythm-flexible voice conversion without parallel
data. Preliminary evaluation on two datasets showed very encouraging results.
Comment: 8 pages, 6 figures, Submitted to SLT 201
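To make the two prosodic quantities defined above concrete, here is a minimal sketch that computes a speaking rate (phonemes per second) and per-phoneme duration statistics from a phoneme-level segmentation. The `(phoneme, start, end)` segment format and the helper names `speaking_rate` and `duration_distribution` are hypothetical; the paper itself works on phoneme posteriorgram sequences, not on this representation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical segmentation format: (phoneme label, start time in s, end time in s).
Segment = tuple[str, float, float]


def speaking_rate(segments: list[Segment]) -> float:
    """Phonemes per second, measured from the first start to the last end."""
    if not segments:
        return 0.0
    span = segments[-1][2] - segments[0][1]
    return len(segments) / span


def duration_distribution(segments: list[Segment]) -> dict[str, tuple[int, float]]:
    """Per-phoneme (count, mean duration): a crude summary of rhythmic patterns."""
    durations: dict[str, list[float]] = defaultdict(list)
    for phone, start, end in segments:
        durations[phone].append(end - start)
    return {phone: (len(d), mean(d)) for phone, d in durations.items()}


# Toy example, not data from the paper.
segs = [("h", 0.0, 0.1), ("e", 0.1, 0.3), ("l", 0.3, 0.4), ("o", 0.4, 0.6), ("l", 0.6, 0.8)]
print(speaking_rate(segs))  # about 6.25 phonemes per second
print(duration_distribution(segs))
```

A fixed-length conversion model preserves the source utterance's span and phoneme count, so both numbers stay those of the source speaker; a length-flexible model can change them.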
AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement
Speech enhancement systems are typically trained using pairs of clean and
noisy speech. In audio-visual speech enhancement (AVSE), much less ground-truth
clean data is available; most audio-visual datasets are collected in real-world
environments with background noise and reverberation, which hampers the
development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based
audio-visual speech enhancement approach that can generate clean speech despite
the challenges of real-world training data. We obtain a subset of nearly clean
speech from an audio-visual corpus using a neural quality estimator, and then
train a diffusion model on this subset to generate waveforms conditioned on
continuous speech representations from AV-HuBERT with noise-robust training. We
use continuous rather than discrete representations to retain prosody and
speaker information. With this vocoding task alone, the model can perform
speech enhancement better than a masking-based baseline. We further fine-tune
the diffusion model on clean/noisy utterance pairs to improve the performance.
Our approach outperforms a masking-based baseline in terms of both automatic
metrics and a human listening test and is close in quality to the target speech
in the listening test. Audio samples can be found at
https://home.ttic.edu/~jcchou/demo/avse/avse_demo.html.
Comment: Submitted to ICASSP 202
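The data-selection step described above (keeping only nearly clean utterances as judged by a neural quality estimator) amounts to a threshold filter. In this minimal sketch, `select_nearly_clean`, the `predict_quality` callable, the threshold, and the scores are all hypothetical placeholders; the abstract does not specify the estimator's interface or cutoff.

```python
from typing import Callable, Iterable


def select_nearly_clean(
    utterances: Iterable[str],
    predict_quality: Callable[[str], float],
    threshold: float,
) -> list[str]:
    """Keep utterances whose estimated quality meets or exceeds the threshold."""
    return [utt for utt in utterances if predict_quality(utt) >= threshold]


# Hypothetical scores standing in for a quality estimator's output;
# these are not values from the paper.
fake_scores = {"utt_a": 4.2, "utt_b": 2.9, "utt_c": 3.6}
subset = select_nearly_clean(fake_scores, fake_scores.__getitem__, threshold=3.5)
print(subset)  # ['utt_a', 'utt_c']
```

Only the retained subset would then serve as training targets for the resynthesis model. The threshold trades subset size against how clean the targets are.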
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Recent work on speech representation models jointly pre-trained with text has
demonstrated the potential of improving speech representations by encoding
speech and text in a shared space. In this paper, we leverage such shared
representations to address the persistent challenge of limited data
availability in spoken language understanding tasks. By employing a pre-trained
speech-text model, we find that models fine-tuned on text can be effectively
transferred to speech test data. With as little as 1 hour of labeled speech
data, our proposed approach achieves performance on spoken language
understanding tasks (specifically, sentiment analysis and named entity
recognition) comparable to that of previous methods using speech-only
pre-trained models fine-tuned on 10 times more data. Beyond the proof-of-concept study, we
also analyze the latent representations. We find that the bottom layers of
speech-text models are largely task-agnostic and align speech and text
representations into a shared space, while the top layers are more
task-specific.
Toward Joint Language Modeling for Speech Units and Text
Speech and text are two major forms of human language. The research community
has focused on mapping speech to text, or vice versa, for many years.
However, in the field of language modeling, very little effort has been made to
model them jointly. In light of this, we explore joint language modeling for
speech units and text. Specifically, we compare different speech tokenizers to
transform continuous speech signals into discrete units and use different
methods to construct mixed speech-text data. We introduce automatic metrics to
evaluate how well the joint LM mixes speech and text. We also fine-tune the LM
on downstream spoken language understanding (SLU) tasks with different
modalities (speech or text) and test its performance to assess how well the
model learns shared representations. Our results show that by mixing speech
units and text with our proposed mixing techniques, the joint LM improves over
a speech-only baseline on SLU tasks and shows zero-shot cross-modal
transferability.
Comment: EMNLP Findings 202
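One widely used way for a speech tokenizer to turn continuous features into discrete units is to assign each frame to its nearest k-means centroid and then collapse consecutive repeats. The minimal sketch below assumes precomputed centroids and toy 2-D frame features. It is not necessarily any of the tokenizers compared in the paper, and `frames_to_units` and `dedup` are hypothetical names.

```python
from itertools import groupby
from math import dist


def frames_to_units(frames, centroids):
    """Map each feature frame to the index of its nearest centroid (Euclidean)."""
    return [min(range(len(centroids)), key=lambda k: dist(f, centroids[k])) for f in frames]


def dedup(units):
    """Collapse runs of identical consecutive units, e.g. [3, 3, 1] -> [3, 1]."""
    return [u for u, _ in groupby(units)]


# Toy centroids and frames; real systems use high-dimensional self-supervised features.
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.0, 0.1), (0.9, 1.1), (1.0, 0.9), (0.1, 0.9), (0.0, 0.0)]
units = frames_to_units(frames, centroids)
print(units)         # [0, 0, 1, 1, 2, 0]
print(dedup(units))  # [0, 1, 2, 0]
```

The resulting unit IDs can be treated as ordinary vocabulary tokens, which lets a language model consume them alongside text tokens.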
Proteomic analysis of rhein-induced cyt: ER stress mediates cell death in breast cancer cells
Rhein is a natural product purified from herbal plants such as Rheum palmatum and has been shown to have anti-angiogenic and anti-metastatic properties. However, its biological effects on the behavior of breast cancer cells have not been fully elucidated. To evaluate whether rhein might be useful in treating breast cancer, and to determine its cytotoxic mechanism, we analyzed the impact of rhein treatment on differential protein expression and redox regulation in a non-invasive breast cancer cell line, MCF-7, and an invasive breast cancer cell line, MDA-MB-231, using lysine- and cysteine-labeling two-dimensional difference gel electrophoresis (2D-DIGE) combined with MALDI-TOF/TOF mass spectrometry. This proteomic study revealed that 73 proteins showed significant changes in expression, while 9 proteins showed significant changes in thiol reactivity in both MCF-7 and MDA-MB-231 cells. The results also demonstrated that rhein-induced cytotoxicity in breast cancer cells mainly involves dysregulation of cytoskeleton regulation, protein folding, the glycolysis pathway, and transcription control. Further study indicated that rhein promotes the misfolding of cellular proteins and disrupts the cellular redox balance, leading to ER stress. Our work shows that this proteomic strategy offers a high-throughput platform for studying the molecular mechanisms of rhein-induced cytotoxicity in breast cancer cells. The identified differentially expressed proteins might be further evaluated as potential targets for breast cancer therapy.
Elevated BCRP/ABCG2 Expression Confers Acquired Resistance to Gefitinib in Wild-Type EGFR-Expressing Cells
The sensitivity of non-small cell lung cancer (NSCLC) patients to EGFR tyrosine kinase inhibitors (TKIs) is strongly associated with activating EGFR mutations. Although not as sensitive as patients harboring these mutations, some patients with wild-type EGFR (wtEGFR) remain responsive to EGFR TKIs, suggesting that unexplored mechanisms render most wtEGFR-expressing cancer cells insensitive. Here, we show that acquired resistance of wtEGFR-expressing cancer cells to the EGFR TKI gefitinib is associated with elevated expression of breast cancer resistance protein (BCRP/ABCG2), which in turn leads to gefitinib efflux from cells. In addition, BCRP/ABCG2 expression correlates with poor response to gefitinib in both cancer cell lines and lung cancer patients with wtEGFR. Co-treatment with BCRP/ABCG2 inhibitors enhanced the anti-tumor activity of gefitinib. Thus, BCRP/ABCG2 expression may be a predictor of poor efficacy of gefitinib treatment, and targeting BCRP/ABCG2 may broaden the use of gefitinib in patients with wtEGFR.
Adjuvant chemotherapy and survival outcomes in rectal cancer patients with good response (ypT0-2N0) after neoadjuvant chemoradiotherapy and surgery: A retrospective nationwide analysis
Background: For rectal cancer, it remains unclear how to incorporate tumor response to neoadjuvant chemoradiotherapy (nCRT) when deciding whether to give adjuvant chemotherapy. In this study, we aimed to determine the survival benefit of adjuvant chemotherapy for rectal cancer patients with good response (ypT0-2N0) after nCRT and surgery.
Methods: The study cohort included 720 rectal cancer patients from the Taiwan Cancer Registry and National Health Insurance Research database who had good response (ypT0-2N0) after nCRT and surgery and who did or did not receive adjuvant chemotherapy between January 2007 and December 2017. The Kaplan–Meier method, log-rank tests, and Cox regression analysis were used to investigate the effect of adjuvant chemotherapy on 5-year overall survival (OS) and disease-free survival (DFS).
Results: Of 720 patients, 368 (51.1%) received adjuvant chemotherapy and 352 (48.9%) did not. Patients who received adjuvant chemotherapy were more likely to be female and younger (≤ 65), and to have advanced clinical T (3-4)/N (1-2) classification and ypT2 classification. No significant difference in 5-year OS (p=0.681) or DFS (p=0.942) was observed between patients who did and did not receive adjuvant chemotherapy. Multivariable analysis revealed that adjuvant chemotherapy was not associated with better OS (adjusted hazard ratio [aHR], 1.03; 95% confidence interval [CI], 0.88-1.21) or DFS (aHR, 1.05; 95% CI, 0.89-1.24). Stratified analysis of OS and DFS found no significant protective effect of adjuvant chemotherapy, even for patients with advanced clinical T or N classification.
Conclusion: Adjuvant chemotherapy may be omitted in rectal cancer patients with good response (ypT0-2N0) after nCRT and surgery.
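For readers unfamiliar with the Kaplan–Meier method named in the Methods, the following minimal sketch computes the product-limit survival estimate S(t), defined as the product over event times t_i ≤ t of (1 − d_i / n_i). Here d_i is the number of events at t_i and n_i is the number of patients still at risk just before t_i. The follow-up times are made-up toy values, not registry data, and `kaplan_meier` is a hypothetical helper. A real analysis would use an established statistics package and would also involve the log-rank tests and Cox regression, which are not shown.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator).

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients censored at time t are still counted as at risk at t.
        at_risk = sum(1 for x in times if x >= t)
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Toy data in months; not from the study.
times = [6, 12, 12, 20, 30, 36]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored patients (events of 0) never reduce S(t) directly. They only shrink the at-risk count for later event times, which is why the estimator can use incomplete follow-up.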